PMStripper 1.03
I. Overview:
This PM shareware utility strips HTML codes from Web pages,
leaving only the text. Some of the page's formatting is
retained, but since PMStripper is not an HTML interpreter, most
formatting is lost. Although the layout of tables and lists is
lost during stripping, their data is placed on separate lines
for legibility.
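The general idea can be illustrated with a short Python sketch.
This is not PMStripper's actual code (which is not published
here); the function name strip_tags and the set of tags treated
as line breaks are assumptions chosen for illustration only.

```python
import re

# Tags that start a new output line. This set is an assumption made
# for this sketch; PMStripper's real list is not documented.
LINE_BREAK_TAGS = ("br", "p", "li", "tr", "td", "th", "dt", "dd",
                   "h1", "h2", "h3", "h4", "h5", "h6")

_BREAK_RE = re.compile(
    r"<\s*/?\s*(?:%s)\b[^>]*>" % "|".join(LINE_BREAK_TAGS),
    re.IGNORECASE,
)
_ANY_TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(html):
    """Remove HTML tags, keeping list and table data on separate lines."""
    # Turn line-breaking tags into newlines so items stay apart.
    text = _BREAK_RE.sub("\n", html)
    # Drop every remaining tag without interpreting it.
    text = _ANY_TAG_RE.sub("", text)
    # Trim each line and discard the empty ones.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Because the sketch never interprets the markup, table and list
layout is lost, but each item still ends up on its own line.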
PMStripper is designed to provide a quick conversion of HTM/HTML
coded files into plain ASCII text. Although the converted files
can be edited while loaded in PMStripper, only simple edit
commands are available. Therefore, if extensive editing is
needed, the text should be loaded into a more capable word
processor or text editor.
One use of PMStripper would be to convert a Web page so that a
spell checker can be used without adding all of the HTML codes
and links to the spelling dictionary.
The registered version offers a menu item to easily move stripped
files to programs suited for advanced editing.
II. Installing PMStripper:
1) Unzip the archive.
2) If REXX is installed: Run the INSTALL.CMD script from an OS/2
command prompt, or by double clicking on the install file's icon.
The script will create a destination directory and transfer
program files to it. Optionally, you may use the unzip directory
as the working directory. In either case the script will create
a PMStripper program object on the desktop and set file
associations for .HTM and .HTML files. Setting associations this
way allows instant loading, and stripping, of saved web pages by
double clicking their icons.
If the install program cannot create the desired directory, just
move all unzipped files to the working directory before running
the install program.
3) If REXX is not installed: Unzip the archive in the desired
working directory and manually: a) Create a desktop program
object, and b) Set .HTM and .HTML associations. (See OS/2
documentation for instructions, if needed.)
III. Files:
PMStripper is distributed in the compressed archive PMSR_xxx.zip,
where xxx is the version number. The archive contains these
files:
NAME            SIZE  DESCRIPTION
FILE_ID.DIZ      434  File description for BBS use.
INSTALL.CMD     2326  Install script.
LICENSE.TXT     4465  License.
LICENSE.UNH     4476  License for UNH.
ORDER.BMT       3708  BMT Micro order form.
PMSTRIP.DOC    11831  This file.
PMSTRIP.EXE   452608  Program executable.
PMSTRIPR.ICO     874  Program icon.
PMSTRIPB.ICO     874  Program icon.
README.UNH      1503  Information file for unh.exe.
TIPS            1243  Tips on using PMStripper.
UNH.EXE        41984  Command line stripper.
IV. Uninstalling PMStripper:
If you find it necessary to remove PMStripper, simply delete the
unzipped files, program object, associations and directory.
PMStripper makes no entries in configuration or initialization
files.
V. Using PMStripper:
PMStripper is a simple program with only five menu bar items:
1. 'File' offers three pull-down menu items: 'Open File', 'Save As'
and 'Exit'. Each performs in a standard OS/2 manner. Picking a
file name for saving is easy: highlight some text for the name
and then click on 'Save As', or simply highlight and press Alt+S.
The 'Open File' selection can also be used to reload the HTML
file if you make a change in the processing options.
The utility will also load HTML coded files for stripping via
drag and drop of the file's icon onto the PMStripper program
object. Dropping files onto an open edit window is not yet
supported; it is a potential enhancement for a future version.
2. 'Edit' has five sub-menu items which also operate as expected.
They are 'Cut', 'Copy', 'Paste', 'Select All' and 'Undo Change'.
The 'Undo Change' selection will undo the last change made to
the text in the window and is only one level deep.
3. 'Options' has five sub-menu items. They are 'Display Options',
'URL Settings', 'External Editor Settings', 'Filename Settings'
and 'Save Settings'.
'Display Options' has two sub-menu items. They are 'Font' and
'Word Wrap'. 'Font' brings up a standard OS/2 font dialog box
and will allow the selection of any of the installed fonts.
'Word Wrap' is a toggle setting that turns word wrap on or off.
The wrap function does not actually reformat the text; instead,
it affects only the way text is displayed.
'URL Settings' has two sub-menu items. They are 'Add URLs' and
'Leave URLs'. These options affect how the HTML file is processed,
and the file must be reloaded for a change to take effect on the
current file. 'Add URLs' appends the URLs found in the HTML file
to the end of the stripped text. 'Leave URLs' leaves the URLs
found in the HTML file in the stripped text.
'External Editor Settings' has two sub-menu items. They are
'Use __TMP2__ File' and 'Use Clipboard'. 'Use __TMP2__ File'
causes the temporary file __TMP2__ to be left in the working
directory for use by an external editor. 'Use Clipboard'
causes the stripped file to be copied to the OS/2 clipboard
when the user selects 'Exit to Word Processor'. These
option settings are only effective in the registered version.
'Filename Settings' has two sub-menu items. They are
'Replace Space with Underscore Character' and
'Leave Space in Filename'. These settings are used to
determine how the highlighted text is converted to a
destination file name for the stripped HTML file. These
option settings are only effective in the registered version.
'Save Settings' saves all of the option settings to an INI
file named PMSTRIP.INI which is located in the working
directory. The display options are not part of the saved
settings and the utility reverts to word wrap on and the
default font when loaded.
4. 'Exit' has two sub-menu items. They are 'Exit' and
'Exit to Word Processor'. 'Exit' causes the stripped
file to be discarded and PMStripper to close.
'Exit to Word Processor' causes the OS/2 CMD file
PMS_CMD.CMD to be executed and PMStripper to close. The
'Exit to Word Processor' option is only effective in the
registered version.
5. 'About' displays copyright and contact information.
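As an illustration of the 'Add URLs' option described under
'Options' above, the Python sketch below collects link targets
from the HTML source and appends them to the end of the stripped
text. It is only a sketch under stated assumptions: the names
collect_urls and append_urls are hypothetical, only HREF
attributes are examined, and the blank line separating the text
from the URL list is a guess rather than PMStripper's documented
output format.

```python
import re

# Matches HREF="...", HREF='...' and unquoted HREF=... values.
_HREF_RE = re.compile(r"""href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def collect_urls(html):
    """Return every HREF target in document order (duplicates kept)."""
    return _HREF_RE.findall(html)


def append_urls(stripped_text, html):
    """Append the URLs found in html to the end of stripped_text."""
    urls = collect_urls(html)
    if not urls:
        return stripped_text
    # Assumed layout: one blank line, then one URL per line.
    return stripped_text.rstrip("\n") + "\n\n" + "\n".join(urls) + "\n"
```

Since the URLs are gathered while the HTML source is processed, a
change to such an option can only show up after the file has been
loaded again, which matches the reload requirement noted above.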
VI. The active keyboard accelerators (short cut keys) are:
Exit             Alt+X
Copy             Ctrl+Insert
Cut              Shift+Delete
Select All       Ctrl+/
Open File        Alt+F
Paste            Shift+Insert
Save As          Alt+S
Word Processor   Alt+W
Undo Change      Alt+U
The keyboard accelerators are not case sensitive.
VII. Miscellaneous Notes:
When dragging a file from WebExplorer the file must be dropped on
the desktop (or in a folder) before it can be dropped on the
PMStripper program object.
This utility will only run on OS/2 Warp and later releases.
One useful feature is the ability to mark text in the stripped file
and use the highlighted text as the file's 'Save As' name. This
is very useful if you have HPFS formatted drives. NOTE: Spaces
and some punctuation characters are converted to "_" characters
in the file name unless the option to use spaces is selected, in
which case those characters are converted to spaces instead. The
"/" and "\" characters are deleted and not replaced. This feature
is only activated in the registered version of PMStripper.
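The file name rules in the note above can be sketched in Python
as follows. The function name to_filename is hypothetical, and
because this document does not say which punctuation characters
are converted, the CONVERTED set below is purely an assumption.

```python
# Characters replaced by the filler. The document only says "spaces
# and some punctuation characters"; this exact set is an assumption.
CONVERTED = set(' :*?"<>|')

# Characters that are deleted outright, as stated in the note.
DELETED = set("/\\")


def to_filename(selection, use_spaces=False):
    """Turn highlighted text into a 'Save As' file name."""
    filler = " " if use_spaces else "_"
    result = []
    for ch in selection.strip():
        if ch in DELETED:
            continue  # "/" and "\" are dropped, not replaced
        result.append(filler if ch in CONVERTED else ch)
    return "".join(result)
```

For example, to_filename("OS/2 Warp: Tips") gives
"OS2_Warp__Tips", and with use_spaces=True the same selection
gives "OS2 Warp  Tips".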
The HTML specification defines Character Entity Sets or tags
to represent particular graphic characters which have special
meanings in places in the markup, or may not be part of the
character set available to the writer. PMStripper does not
attempt to scan for all of the possible tags, but does try to
resolve the most common tags.
This version of PMStripper has support for codepages 437 and 850
and if codepage 850 is in use, the 850 character set is used.
The codepages only make a difference when &xxxx; tags are
present in the file. If the correct character or an acceptable
alternate is not available or the tag is unknown to PMStripper,
then the &xxxx; tag will be left in the file.
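A rough Python sketch of this behaviour is shown below. It is
not PMStripper's implementation: it relies on Python's
html.entities table (a far larger set than the "most common
tags" mentioned above) and on Python's cp437 and cp850 codecs,
and it does not attempt the "acceptable alternate"
substitutions. The name resolve_entities is hypothetical.

```python
import re
from html.entities import name2codepoint

_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_entities(text, codepage="cp437"):
    """Replace &xxxx; tags that the given codepage can display."""

    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            return match.group(0)  # unknown tag: leave it in the text
        char = chr(codepoint)
        try:
            char.encode(codepage)  # is the character in this codepage?
        except UnicodeEncodeError:
            return match.group(0)  # not available: leave the tag
        return char

    return _ENTITY_RE.sub(replace, text)
```

With codepage 437 the copyright sign is not available, so
&copy; stays in the text, while under codepage 850 it is
resolved. &eacute; is resolved in both codepages.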
Only a few of the &#nnn; tags are supported. They do not seem to
be widely used and scanning for all of them will increase the time
it takes to process an .HTML or .HTM file.
VIII. Why & How to Register:
The Word Processor option runs the PMS_CMD.CMD file located in
the working directory specified in the Program Object. This file
is used to start the word processor or editor of your choice to
edit the stripped text file named __TMP2__ or to allow you to
paste the stripped file into your editor. PMStripper will
close after executing the PMS_CMD.CMD file.
NOTE: The __TMP2__ file is discarded if PMStripper is closed
via the 'Exit' menu item. Double clicking PMStripper's upper
left corner, using Alt+F4 or selecting that menu's 'Close'
may cause the temporary stripped file (named __TMP2__) to remain
in the working directory.
This menu item is disabled in the unregistered version. Instead
of invoking the command script, PMStripper shows an unregistered
message that requires a user response.
Example PMS_CMD.CMD files:
To use the system editor E.EXE, the PMS_CMD.CMD file would
contain:
    E __TMP2__
To use a word processor or editor whose executable is not in the
path, the command script must copy the __TMP2__ file to the
desired program's data directory, change to that directory and
then launch the word processor/editor. An example PMS_CMD.CMD
file to use DeScribe is shown below.
    copy __TMP2__ g:\describe\__TMP2__
    g:
    cd \describe
    describe __TMP2__
In addition to activating the Word Processor option,
registration eliminates the opening unregistered message that
requires a user response, along with the unregistered line that
is inserted at the top of the stripped file.
Registered users are supported via e-mail. Send help requests and
good ideas to me at dwhawk@southwind.net.
There are two ways to register PMStripper: through BMT Micro or
directly with the author.
Registration through BMT Micro:
BMT Micro will accept credit cards and will be more convenient for
OS/2 users outside the United States. BMT Micro's price to register
PMStripper is $9.95 (US Dollars). BMT Micro also has an FTP area
where the registered version can be obtained after registration.
Direct registration:
Stuff small bills, gold coins, diamonds or even checks (US banks only,
please) valued at $7.50 (US dollars) into an envelope and mail to:
    Don Hawkinson
    4555 N Hillcrest
    Wichita, KS 67220-3832
Please don't send $100 bills (or larger) in the mail without
purchasing full postal insurance. Also, no change will be
returned because it is absolutely unsafe, and unwise, to send
cash through the mail.
The registered version of PMStripper will be distributed by e-mail
in the form of a uuencoded zip file, so make certain that your
e-mail address is included with your registration fee.
Registered users will be notified of updates via e-mail.
IX. Acknowledgments:
Thanks to the following netizens for their help in testing
and helpful comments during development.
DenverD@ibm.net
Emil_Kucera@Environment.gov.MB.CA
fvlaming@netcom.com
jhiatt@ibm.net
jlink@best.com
p_daley@conknet.com
tombeck@usemail.com
Thanks to a net WordSmith (WrdSmth@IBM.net) for editing help.
(Actually he converted my very rough draft to this document.)
Copyrights and trademarks remain the property of their owners.
Don Hawkinson
dwhawk@southwind.net